MATAM: reconstruction of phylogenetic marker genes from short sequencing reads in metagenomes Supplementary Material

Authors

  • Pierre Pericard
  • Yoann Dufresne
  • Loïc Couderc
  • Samuel Blanquart
  • Hélène Touzet
Abstract

MATAM's default SSU rRNA reference database is built from the Silva 128 SSU Ref NR99 database [10], which comprises 645 151 prokaryotic 16S rRNA sequences (https://www.arb-silva.de/fileadmin/silva_databases/release_128/Exports/SILVA_128_SSURef_Nr99_tax_silva_trunc.fasta.gz). Sequences containing consecutive runs of unknown nucleotides (N) longer than 5 nucleotides are filtered out, and the remaining Ns are replaced with A nucleotides, yielding 642 903 sequences. The filtered reference database is finally clustered with Sumaclust [7, 4] using semi-global alignment and a 95% identity threshold. All of these steps can be performed with the provided script, matam_db_preprocessing.py, from the GitHub repository (https://github.com/bonsai-team/matam).
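The filtering and masking step can be illustrated with a minimal Python sketch. This is not the code of matam_db_preprocessing.py; the function names (parse_fasta, filter_and_mask) are hypothetical, and the sketch assumes that "longer than 5 nucleotides" means a run of 6 or more consecutive Ns. It also upper-cases sequences before matching, which is a choice made here for simplicity. The Sumaclust clustering step is not covered.

```python
import re

# Assumption: "longer than 5" is read as a run of 6 or more consecutive Ns.
LONG_N_RUN = re.compile(r"N{6,}")


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_and_mask(records):
    """Drop records containing a long N run; replace remaining Ns with A."""
    for header, seq in records:
        seq = seq.upper()  # simplification made in this sketch
        if LONG_N_RUN.search(seq):
            continue
        yield header, seq.replace("N", "A")


# Toy input: the first record has 2 Ns (kept and masked),
# the second has a run of 6 Ns (dropped).
example = [">keep", "ACGNNT", "ACG", ">drop", "ACNNNNNNGT"]
cleaned = list(filter_and_mask(parse_fasta(example)))
# cleaned == [("keep", "ACGAATACG")]
```

For the real Silva export, the same functions could be applied to lines read from the gzip-compressed FASTA file, for example through Python's gzip.open in text mode.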


Similar resources

MATAM: reconstruction of phylogenetic marker genes from short sequencing reads in metagenomes

Motivation: Advances in the sequencing of uncultured environmental samples, dubbed metagenomics, raise a growing need for accurate taxonomic assignment. Accurate identification of organisms present within a community is essential to understanding even the most elementary ecosystems. However, current high-throughput sequencing technologies generate short reads which partially cover full-length ma...


FragGeneScan: predicting genes in short and error-prone reads

The advances of next-generation sequencing technology have facilitated metagenomics research that attempts to determine directly the whole collection of genetic material within an environmental sample (i.e. the metagenome). Identification of genes directly from short reads has become an important yet challenging problem in annotating metagenomes, since the assembly of metagenomes is often not a...


Genovo: De Novo Assembly for Metagenomes

Next-generation sequencing technologies produce a large number of noisy reads from the DNA in a sample. Metagenomics and population sequencing aim to recover the genomic sequences of the species in the sample, which could be of high diversity. Methods geared towards single sequence reconstruction are not sensitive enough when applied in this setting. We introduce a generative probabilistic mode...


Transcriptome analysis of the freshwater pearl mussel, Hyriopsis cumingii (Lea) using Illumina paired-end sequencing to identify genes and markers

The transcriptome of the triangle sail mussel Hyriopsis cumingii (Lea) was sequenced using Illumina paired-end technology and analyzed. Equal quantities of total RNA isolated from six tissues, including gonad, hepatopancreas, foot, mantle, gill and adductor muscle, were pooled to construct a cDNA library. A total of 58.09 million clean reads with 98.48% Q20 bases were generated. Cluster...


Accurate binning of metagenomic contigs via automated clustering sequences using information of genomic signatures and marker genes.

Metagenomics, the application of shotgun sequencing, facilitates the reconstruction of the genomes of individual species from natural environments. A major challenge in the genome recovery domain is to agglomerate or 'bin' sequences assembled from metagenomic reads into individual groups. Metagenomic binning without consideration of reference sequences enables the comprehensive discovery of new...




Publication date: 2017